Monte Carlo methods are computational techniques that use random sampling to estimate numerical results for problems that are otherwise hard to solve exactly. They are especially useful when dealing with systems that involve uncertainty or complexity that makes traditional calculations difficult. The method was applied in the Manhattan Project to simulate neutron diffusion and nuclear reactions. The term “Monte Carlo” originates from the Monte Carlo Casino in Monaco, highlighting the reliance on chance and randomness.
Monte Carlo methods rely on random sampling to obtain numerical results for deterministic problems:
\[ \text{Estimate} = \frac{1}{N} \sum_{i=1}^{N} f(X_i) \]
Where:
- \(N\) = number of samples
- \(X_i\) = random samples drawn from the distribution of interest
- \(f(X_i)\) = the function of interest evaluated at each sample
Law of Large Numbers: As \(N \to \infty\), the sample mean converges to the expected value:
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} f(X_i) = \mathbb{E}[f(X)] \]
Central Limit Theorem: For large \(N\), the sampling distribution of the mean approaches a normal distribution:
\[ \sqrt{N}(\bar{X}_N - \mu) \xrightarrow{d} N(0, \sigma^2) \]
Monte Carlo can estimate π by sampling points uniformly in the square \([-1,1]^2\) and counting how many fall inside the inscribed unit circle. Since the circle covers a fraction \(\pi/4\) of the square’s area, the fraction of points landing inside, multiplied by 4, approximates π:
\[ \pi \approx 4 \times \frac{\text{Number of points inside the circle}}{\text{Total number of points}} \]
Conceptual Steps:
# Define different sample sizes to test Monte Carlo estimation accuracy
Samples <- c(1000, 10000, 100000, 1000000)
# Assign variable names for readability
s_1 <- Samples[1] # 1,000 samples
s_2 <- Samples[2] # 10,000 samples
s_3 <- Samples[3] # 100,000 samples
s_4 <- Samples[4] # 1,000,000 samples
# Set random seed for reproducibility (ensures same random numbers each run)
set.seed(123)
# Choose the number of random points to generate
n_points <- s_1
# Generate n_points random x and y coordinates uniformly between -1 and 1
x <- runif(n_points, -1, 1)
y <- runif(n_points, -1, 1)
# Check whether each (x, y) point lies inside the circle (x² + y² ≤ 1)
inside <- ((x^2 + y^2) <= 1)
# Create a data frame for visualization or analysis
df <- data.frame(x, y, inside)
# Estimate the value of π using Monte Carlo method:
# ratio of points inside the circle to total points × 4 (area ratio)
pi_est <- 4 * sum(inside) / n_points
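The vector `Samples` defined above suggests comparing accuracy across sample sizes. A minimal sketch (an illustration, not part of the original analysis) re-runs the estimator for each size, showing the Law of Large Numbers at work:

```r
# Re-run the pi estimator for each sample size; the error shrinks
# roughly like 1/sqrt(N), as the Central Limit Theorem predicts
Samples <- c(1000, 10000, 100000, 1000000)  # same sizes as above
set.seed(123)  # reproducible draws
for (n in Samples) {
  x <- runif(n, -1, 1)                # uniform x-coordinates
  y <- runif(n, -1, 1)                # uniform y-coordinates
  pi_est <- 4 * mean(x^2 + y^2 <= 1)  # fraction inside circle, times 4
  cat(sprintf("N = %7d  estimate = %.5f  error = %.5f\n",
              n, pi_est, abs(pi_est - pi)))
}
```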
# MONTE CARLO SIMULATION FUNCTION
montecarlo_option <- function(S0, K, r, sigma, T, n_paths, n_steps) {
dt <- T / n_steps # Time step size
# Initialize matrix to store stock price paths
paths <- matrix(0, nrow = n_paths, ncol = n_steps + 1)
paths[, 1] <- S0
# Generate standard normal random numbers
z_matrix <- matrix(rnorm(n_paths * n_steps), nrow = n_paths, ncol = n_steps)
# Simulate stock paths using Geometric Brownian Motion
for (j in 2:(n_steps + 1)) {
paths[, j] <- paths[, j - 1] * exp(
(r - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * z_matrix[, j - 1]
)
}
# Calculate option payoffs
call_payoff <- pmax(paths[, n_steps + 1] - K, 0)
put_payoff <- pmax(K - paths[, n_steps + 1], 0)
call_price <- exp(-r * T) * mean(call_payoff)
put_price <- exp(-r * T) * mean(put_payoff)
return(list(
call_price = call_price,
put_price = put_price,
paths = paths
))
}
# SIMULATION PARAMETERS
S0 <- 100 # Initial stock price
K <- 105 # Strike price
r <- 0.05 # Risk-free interest rate (annual)
sigma <- 0.2 # Volatility (annual)
T <- 1 # Time to maturity (years)
n_paths <- 1000 # Number of Monte Carlo paths
n_steps <- 252 # Number of time steps
set.seed(123) # For reproducibility
# Run the simulation
option_result <- montecarlo_option(S0, K, r, sigma, T, n_paths, n_steps)

| Result | Value |
|---|---|
| Call Option Price | 8.591995 |
| Put Option Price | 7.464224 |
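Under geometric Brownian motion, European option prices also have the Black-Scholes closed form, which offers a useful cross-check on the simulation. This comparison is an addition for illustration, using the same parameters as above:

```r
# Black-Scholes closed-form prices for the same European call and put,
# as a sanity check on the Monte Carlo estimates
S0 <- 100; K <- 105; r <- 0.05; sigma <- 0.2; T <- 1
d1 <- (log(S0 / K) + (r + 0.5 * sigma^2) * T) / (sigma * sqrt(T))
d2 <- d1 - sigma * sqrt(T)
bs_call <- S0 * pnorm(d1) - K * exp(-r * T) * pnorm(d2)   # ~8.02
bs_put  <- K * exp(-r * T) * pnorm(-d2) - S0 * pnorm(-d1) # ~7.90
```

The Monte Carlo prices differ from the closed-form values by sampling error, which shrinks at rate \(1/\sqrt{n\_paths}\); increasing `n_paths` well beyond 1000 narrows the gap.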
Monte Carlo simulations are applied wherever complex systems involve uncertainty or stochastic behavior. Key use cases include:
- Numerical integration in high-dimensional spaces
- Pricing options and other financial derivatives, and portfolio risk analysis
- Physics and engineering simulations (e.g., particle transport, reliability analysis)
- Project planning and forecasting under uncertainty
Benford’s Law, also known as the First-Digit Law, describes the frequency distribution of leading digits in many naturally occurring datasets. Contrary to intuition, lower digits occur more frequently as the first digit. Specifically, the digit 1 appears about 30% of the time, while higher digits (e.g., 9) appear less than 5% of the time. Applications of Benford’s Law include fraud detection in accounting, forensic analysis, and data validation, as deviations from the expected distribution may indicate manipulation or anomalies. The law is named after physicist Frank Benford, who formalized it in 1938, though it was first observed by Simon Newcomb in 1881.
Benford’s Law predicts the probability of each digit \(d\) (1–9) as the first significant digit:
\[ P(D = d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d \in \{1,2,\dots,9\} \]
Where:
- \(D\) = first significant digit
- \(P(D=d)\) = probability of digit \(d\) occurring as the first digit
| Digit | Probability |
|---|---|
| 1 | 0.301 |
| 2 | 0.176 |
| 3 | 0.125 |
| 4 | 0.097 |
| 5 | 0.079 |
| 6 | 0.067 |
| 7 | 0.058 |
| 8 | 0.051 |
| 9 | 0.046 |
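The table values follow directly from the formula and can be reproduced in a couple of lines of R:

```r
# Benford first-digit probabilities: P(d) = log10(1 + 1/d)
d <- 1:9
benford_p <- log10(1 + 1 / d)
round(benford_p, 3)
# 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046 (sums to 1)
```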
To check whether a dataset follows Benford’s Law, a Chi-square goodness-of-fit test is often used:
\[ \chi^2 = \sum_{d=1}^{9} \frac{(O_d - E_d)^2}{E_d} \]
Where:
- \(O_d\) = observed frequency of digit \(d\) in the dataset
- \(E_d = P(D=d) \cdot N\) = expected frequency based on Benford’s Law
- \(N\) = total number of observations

Interpretation:
- If \(\chi^2\) is below the critical value for 8 degrees of freedom (9 digits − 1), the dataset conforms to Benford’s Law.
- A significantly large \(\chi^2\) indicates deviation, potentially signaling anomalies or fraud.
# Create a Dataframe from the data
payroll_df <- read.csv("/home/vinayak/payroll_data.csv", stringsAsFactors = FALSE)
# Check first few rows of the data
head(payroll_df)

## EmployeeID Salary
## 1 E0001 12799.80
## 2 E0002 36267.33
## 3 E0003 25374.23
## 4 E0004 10371.87
## 5 E0005 16493.14
## 6 E0006 50297.88
# Create a Benford object for the Salary column (requires the benford.analysis package)
library(benford.analysis)
bf_salary <- benford(payroll_df$Salary, number.of.digits = 1)
# Plot
plot(bf_salary)

| Digit | Actual Count | Actual (%) | Benford (%) | Difference (%) |
|---|---|---|---|---|
| 1 | 75 | 37.5 | 30.10 | 7.40 |
| 2 | 57 | 28.5 | 17.61 | 10.89 |
| 3 | 27 | 13.5 | 12.49 | 1.01 |
| 4 | 17 | 8.5 | 9.69 | -1.19 |
| 5 | 7 | 3.5 | 7.92 | -4.42 |
| 6 | 6 | 3.0 | 6.69 | -3.69 |
| 7 | 2 | 1.0 | 5.80 | -4.80 |
| 8 | 5 | 2.5 | 5.12 | -2.62 |
| 9 | 4 | 2.0 | 4.58 | -2.58 |
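The chi-square test described above can be applied to the observed counts in this table (200 salaries) to quantify the deviation:

```r
# Observed first-digit counts from the payroll salaries (table above)
observed <- c(75, 57, 27, 17, 7, 6, 2, 5, 4)
N <- sum(observed)                    # 200 observations
expected <- N * log10(1 + 1 / (1:9))  # Benford expected counts E_d
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq
# about 40.1, far above the 5% critical value of 15.51 for 8 degrees
# of freedom, so the salary data deviate strongly from Benford's Law
```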
Benford’s Law is applied wherever naturally occurring numeric data may reveal patterns or anomalies. Key use cases include:
- Fraud detection in accounting and auditing
- Forensic analysis of financial statements and tax filings
- Validation of scientific and survey datasets
- Screening economic and election statistics for irregularities
The Altman Z-Score is a financial metric used to predict the likelihood of a company entering bankruptcy within the next 2 years. Developed by Edward I. Altman in 1968, it combines multiple financial ratios using a weighted linear combination.
The Z-Score is essentially a discriminant function from multivariate statistics, which separates financially distressed firms from healthy firms based on financial ratios.
\[ Z = 1.2X_1 + 1.4X_2 + 3.3X_3 + 0.6X_4 + 1.0X_5 \] \[ \begin{align*} X_1 &= \text{Working Capital / Total Assets} \\ X_2 &= \text{Retained Earnings / Total Assets} \\ X_3 &= \text{EBIT / Total Assets} \\ X_4 &= \text{Market Value of Equity / Total Liabilities} \\ X_5 &= \text{Sales / Total Assets} \end{align*} \]
Linear Combination of Ratios: Each ratio is multiplied by a weight reflecting its relative importance in predicting bankruptcy.
Threshold-based Classification: The resulting Z-Score is compared against predefined thresholds to categorize company risk.
Statistical Background: Altman used multiple discriminant analysis (MDA) to empirically determine the weights of financial ratios using historical company data.
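Before turning to the company data below, a minimal worked example of the weighted formula and the threshold classification; the figures here are invented round numbers, not data from the analysis:

```r
# Altman Z-Score for a single hypothetical firm (illustrative inputs)
working_capital     <- 300;  total_assets      <- 1000
retained_earnings   <- 200;  ebit              <- 100
market_value_equity <- 800;  total_liabilities <- 700
sales               <- 1100

# Weighted linear combination of the five ratios
z <- 1.2 * (working_capital / total_assets) +
  1.4 * (retained_earnings / total_assets) +
  3.3 * (ebit / total_assets) +
  0.6 * (market_value_equity / total_liabilities) +
  1.0 * (sales / total_assets)

# Threshold-based classification
zone <- if (z < 1.8) "Distress" else if (z < 3.0) "Grey Zone" else "Safe"
# z is about 2.76, placing this hypothetical firm in the "Grey Zone"
```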
Years <- 2015:2024
data <- data.frame(
Year = Years,
Working_Capital = c(366.238602998865,323.808647420777,343.087078290206,321.563793374437,259.064022551688,286.40312279824,211.193492518817,146.543957821955,120.787468956851,99.1268239420409),
Total_Assets = c(1188.90726976097,1170.68971353583,1107.21158352681,1138.64022530615,907.384105352685,1043.33879132755,1127.53786125686,964.922380750068,995.454302290455,969.487735605799),
Retained_Earnings = c(231.951533932914,296.766739226545,321.318004328533,250.275203271019,182.746136600687,194.249059878592,171.898617828878,146.832374171086,115.195439756733,166.633505768519),
EBIT = c(104.271432117896,172.407375222222,257.541292546592,100.205467459213,157.800385275683,113.592141257633,79.931630147092,108.317993085784,114.87317938569,63.7215852151974),
Market_Value_Equity = c(986.25972635745,683.829161610882,884.281041977552,674.475781725616,823.487712715455,742.985139989434,703.76548497861,478.018207698149,451.725597007781,335.168616938647),
Total_Liabilities = c(801.790063455701,751.688452623785,784.072960540652,500.249909330159,690.126629639417,588.047554064542,651.926615089178,745.108401309699,640.719163697213,544.454169739038),
Sales = c(1069.07219862906,1373.52237644098,1264.8627313343,1296.5789629515,738.156919997101,1038.1229145616,1137.74948569075,736.237093154764,599.024686413332,385.314262003617)
)
# ---- Compute Z-Score ----
library(dplyr)  # provides %>%, mutate(), and case_when()
data <- data %>%
mutate(
Z_Score = 1.2 * (Working_Capital / Total_Assets) +
1.4 * (Retained_Earnings / Total_Assets) +
3.3 * (EBIT / Total_Assets) +
0.6 * (Market_Value_Equity / Total_Liabilities) +
1.0 * (Sales / Total_Assets),
Zone = case_when(
Z_Score < 1.8 ~ "Distress",
Z_Score < 3.0 ~ "Grey Zone",
TRUE ~ "Safe"
)
)

Altman Z-Score is widely applied in:
- Credit risk assessment by banks and lenders
- Screening investments for early signs of financial distress
- Monitoring the financial health of suppliers and counterparties
- Bankruptcy-prediction research in corporate finance